Brooklyn is one of the most diverse places in New York: over 30% of the city's population lives there. However, when it comes to vegetarian food, few places offer good value, even though vegetarianism is especially popular among the young population. In this project I will propose several possible location options for the venue.
The problem to solve:
Open a vegetarian restaurant of the mid-range price category in Brooklyn. Specifically, I chose the New York City College of Technology as the baseline location point. The area is close to subway stations (as will be shown below) and does not yet have many dining venues. The stakeholders are therefore mostly young people who prefer vegetarian food, together with employees working nearby.
Within this project I will be using several data sources, including:
These will be used to build a profile of the surrounding neighborhood and to assess whether the area is worth starting a business in. For instance, the NYC real estate data gives a first insight into the price range for buying a place if needed, as well as into the solvency of the local population, while the Foursquare data helps build a profile of the restaurants in the vicinity and identify potential competitors. University locations help determine whether the starting point (to be discussed in the methodology section of the main report) sits within a university agglomeration. Subway station locations will be used to illustrate commute options.
All of the sources above, with the exception of Foursquare, were downloaded as shapefiles and, since the JSON option did not always work, later converted to CSV using the mygeodata service: https://mygeodata.cloud/converter/shp-to-csv
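As an in-notebook alternative to the external converter, point features from a GeoJSON file can be flattened into a CSV with plain `json` and pandas. A minimal sketch follows; the sample features and the `stations.csv` filename are illustrative, not taken from the actual project files.

```python
import json
import pandas as pd

# Illustrative GeoJSON point features (not the real project data)
sample_geojson = {
    "features": [
        {"properties": {"name": "Station A"},
         "geometry": {"type": "Point", "coordinates": [-73.99, 40.69]}},
        {"properties": {"name": "Station B"},
         "geometry": {"type": "Point", "coordinates": [-73.98, 40.70]}},
    ]
}

# Flatten each feature into one CSV row: name, lat, long
rows = []
for feature in sample_geojson["features"]:
    lon, lat = feature["geometry"]["coordinates"]  # GeoJSON order is [lon, lat]
    rows.append({"name": feature["properties"]["name"], "lat": lat, "long": lon})

df = pd.DataFrame(rows)
df.to_csv("stations.csv", index=False)
print(df)
```

Note that GeoJSON stores coordinates as `[longitude, latitude]`, so the pair must be swapped when building the `lat`/`long` columns.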
The purpose of this project is to identify possible locations for a vegetarian restaurant in Brooklyn, NYC. The project makes active use of spatial data from the NYU Spatial Data Repository.
The initial presupposition of the research is that mid-range real estate prices bring restaurants of the mid-range category. This assumption was tested in the course of the research. At the first stage, geodata on NY real estate is taken and segmented (using k-means clustering) to map the distribution of building types sold in 2016. The data is somewhat outdated, yet for the purposes of this project, given no major price shifts in the intervening years, it still shows the structural distribution of the objects sold.
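The segmentation step described above can be sketched on synthetic sales records. The feature names mirror the real dataset (price, land square footage), but the values below are made up for illustration; the key point is scaling the features before k-means so that dollar-denominated prices do not dominate the distance metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic sales records: two well-separated groups of properties
rng = np.random.default_rng(0)
prices = np.concatenate([rng.normal(4e5, 5e4, 50), rng.normal(1.5e6, 2e5, 50)])
sqft = np.concatenate([rng.normal(1200, 150, 50), rng.normal(3500, 400, 50)])
X = np.column_stack([prices, sqft])

# Scale features so price (in dollars) does not dominate land_sqft
X_scaled = StandardScaler().fit_transform(X)

# Segment into k=2 clusters (the real analysis would tune k)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(np.bincount(kmeans.labels_))
```

With well-separated groups like these, each cluster recovers one of the two price/size segments.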
Once done, the clusters will be mapped using the folium library.
Then we'll use the Foursquare data to get the list and details of venues within a 2 km vicinity. This will allow a more comprehensive look and a profile of the places based on their price tier, location, tips, likes, rating and other similar characteristics. Once done, a second clustering will be performed, this time on the venues in question. This step will help better locate the competitors. The data will be illustrated visually at each step.
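The Foursquare venue pull described above is built around the `venues/explore` endpoint. A sketch of the request URL is shown below; `CLIENT_ID` and `CLIENT_SECRET` are placeholders for real credentials, and the coordinates are the NYC College of Technology baseline point used throughout this report.

```python
# Placeholder Foursquare credentials (replace with real ones to run a query)
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180605'  # API version date

# Baseline point: NYC College of Technology
latitude, longitude = 40.695457, -73.9864678851903
radius = 2000  # metres, i.e. the 2 km vicinity
LIMIT = 100    # maximum venues per request

url = ('https://api.foursquare.com/v2/venues/explore'
       '?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'
       ).format(CLIENT_ID, CLIENT_SECRET, VERSION, latitude, longitude, radius, LIMIT)
print(url)
# The actual call would then be: results = requests.get(url).json()
```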
At the third stage I'll examine whether any other educational institutions are in the vicinity and how good the commute is. Once all of that is put together, some street names for a possible restaurant opening will be presented.
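For the commute check, straight-line distances between the baseline point and nearby subway stations or institutions can be estimated with the haversine formula. The helper below is my own sketch, not part of the original analysis; the second coordinate pair is an illustrative nearby point, not an actual station.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))  # 6371 km = mean Earth radius

# Baseline point (NYC College of Technology) vs. an illustrative nearby point
college = (40.695457, -73.9864678851903)
d = haversine_km(college[0], college[1], 40.6904, -73.9851)
print(round(d, 2))  # distance in km
```

Anything within roughly 0.5–1 km of the baseline point can be treated as comfortable walking distance.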
The starting point for the location analysis is the New York City College of Technology. This place was chosen with a view to reaching young people and the employees who work nearby, thus covering the target audience.
#import necessary libraries
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np
import json # library to handle JSON files
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON into a pandas dataframe (pd.json_normalize in newer pandas)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library
print('Libraries imported.')
with open('ny_data/newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
newyork_data['features'][0]
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
for data in newyork_data['features']:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
    neighborhood_latlon = data['geometry']['coordinates']  # GeoJSON order: [lon, lat]
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
len(neighborhoods['Borough'].unique()),
neighborhoods.shape[0]) )
neighborhoods['Borough'].unique()
brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
brooklyn_data.head()
#open the real estate data file
with open('ny_data/nyu-2451-34678-geojson.json') as json_data:
    newyork_sale_ft_data = json.load(json_data)
sales_data = json_normalize(newyork_sale_ft_data['features'])
sales_data.columns[0].split('.')[1]
columns = sales_data.columns
cols = []
for i in columns:
    if len(i.split('.')) > 1:
        cols.append(i.split('.')[1])
    else:
        cols.append(i.split('.')[0])
sales_data.columns = cols
cols
sales_data.sample()
Let's first cluster the neighbourhoods of Brooklyn by real estate sales data to get some first insights into neighbourhood economic activity.
In the data, borough 3 is Brooklyn, and only records with a sale price above 10 USD are taken, i.e. usable buildings.
brooklyn_property_sales = sales_data[(sales_data['borough'] == 3) & (sales_data['usable'] == 'Y')]
brooklyn_property_sales = brooklyn_property_sales[['borough','nbhd', 'address', 'zip', 'lat', 'long', 'bldg_ctgy', 'bldg_cls_s', 'tax_cls_s', 'land_sqft','price']]
brooklyn_property_sales.bldg_cls_s.value_counts()
# make onehot encodding for selected columns
brooklyn_property_sales = pd.get_dummies(brooklyn_property_sales, prefix='CAT_', columns= ['bldg_ctgy', 'bldg_cls_s',] )
brooklyn_property_sales.head()
address = 'Brooklyn, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Brooklyn borough are {}, {}.'.format(latitude, longitude))
# create map of Brooklyn using latitude and longitude values
map_brooklyn = folium.Map(location=[latitude, longitude], zoom_start=11)
# add markers to map
for lat, lng, label in zip(brooklyn_property_sales['lat'], brooklyn_property_sales['long'], brooklyn_property_sales['nbhd']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_brooklyn)
#add New York City College of Technology to the map (big dark circle)
label = folium.Popup('New York City College of Technology', parse_html=True)
folium.CircleMarker(
    [40.695457, -73.9864678851903],
    radius=20,
    popup=label,
    color='darkgreen',
    fill=True,
    fill_color='darkgreen',
    fill_opacity=0.7).add_to(map_brooklyn)
map_brooklyn